Automatic Removal of Marginal Annotations in Printed Text Document

Authors

  • Abdessamad Elboushaki
  • Rachida Hannane
  • P. Nagabhushan
  • Mohammed Javed
Abstract

Recovering the original printed text from a document with handwritten annotations added in the margins is a challenging problem, especially when the original document is not available. This paper therefore aims to recover the original document automatically from its annotated version by detecting and removing handwritten annotations in the marginal area without losing printed content. A two-stage algorithm is proposed. In the first stage, the marginal boundary is detected approximately using horizontal and vertical projection profiles; because the boundary is only approximate, this removes all marginal annotations along with some of the original printed text lying very close to the boundary. In the second stage, a connected-component strategy brings back the printed text components cropped during the first stage. The method is validated on a dataset of 50 documents with complex handwritten annotations, achieving an overall accuracy of 89.01% in removing marginal annotations and 97.74% in retrieving the original printed text.
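The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give thresholds or the exact restoration rule, so the function names, the `min_fraction` threshold, the synthetic `make_demo_page` image, and the rule "restore any component that touches the kept region" are all assumptions made for illustration. The input is assumed to be a binarized page where `True` means ink.

```python
from collections import deque

import numpy as np


def projection_bounds(binary, min_fraction=0.5):
    """Approximate the printed-text box from projection profiles.

    A row or column is treated as part of the text body when its ink count
    reaches min_fraction of the largest profile value (assumed threshold).
    Returns (top, bottom, left, right) with exclusive bottom/right.
    """
    rows = binary.sum(axis=1)  # horizontal projection profile
    cols = binary.sum(axis=0)  # vertical projection profile
    r = np.flatnonzero(rows >= min_fraction * rows.max())
    c = np.flatnonzero(cols >= min_fraction * cols.max())
    return int(r[0]), int(r[-1]) + 1, int(c[0]), int(c[-1]) + 1


def crop_to_bounds(binary, bounds):
    """Stage 1: erase everything outside the approximate text box."""
    top, bottom, left, right = bounds
    kept = np.zeros_like(binary, dtype=bool)
    kept[top:bottom, left:right] = binary[top:bottom, left:right]
    return kept


def restore_cut_components(binary, kept):
    """Stage 2 (assumed rule): regrow components that were cut by the crop.

    Every ink pixel that is 8-connected to a kept pixel is restored, so
    components lying entirely in the margin stay removed.
    """
    restored = kept.copy()
    height, width = binary.shape
    queue = deque(zip(*np.nonzero(kept)))
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < height and 0 <= nx < width
                if inside and binary[ny, nx] and not restored[ny, nx]:
                    restored[ny, nx] = True
                    queue.append((ny, nx))
    return restored


def make_demo_page():
    """Hypothetical 20x30 page: six text lines, one overhang, one margin blob."""
    page = np.zeros((20, 30), dtype=bool)
    for row in (5, 7, 9, 11, 13, 15):
        page[row, 8:22] = True   # printed text lines
    page[7, 6:8] = True          # printed stroke sticking out past the box
    page[17:19, 1:3] = True      # isolated handwritten mark in the margin
    return page


if __name__ == "__main__":
    page = make_demo_page()
    bounds = projection_bounds(page)
    cropped = crop_to_bounds(page, bounds)
    cleaned = restore_cut_components(page, cropped)
    print("bounds:", bounds)
    print("ink before / after:", int(page.sum()), int(cleaned.sum()))
```

In this sketch the restoration step grows outward from pixels that survived the crop, which is one simple way to recover printed strokes clipped at the boundary. An annotation that physically touches printed text would also be restored under this rule, so a real system would need an additional criterion to separate handwriting from print.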

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

Automatic Processing of Document Annotations

A common authoring technique involves making annotations on a printed draft and then typing the corrections into a computer at a later date. In this paper, we describe a system that goes some way towards automating this process. The author simply passes the annotated documents through a sheetfeed scanner and then brings up the electronic document in a text editor. The system then works out wher...

Related Documents Search Using User Created Annotations

We often use various services for creating bookmarks, tags, highlights and other types of annotations while surfing the Internet or reading electronic documents. These services allow us to create many of the types of annotation that we commonly make in printed documents. Annotations attached to electronic documents, however, can be used for other purposes such as navigatio...

Handwriting Identification, Matching, and Indexing in Noisy Document Images (LAMP-TR-129, CS-TR-4781, UMIACS-TR-2006-06, January 2006)

Throughout history, handwriting has been the primary means of recording information that is preserved across both time and space. With the coming of the electronic document era, we are challenged with making an enormous amount of handwritten documents available for electronic access. Though many handwritten documents contain only handwriting, more are now mixed with printed text, noise, and b...

Journal:
  • CoRR

Volume: abs/1408.2015

Publication date: 2014